arXiv: 1501.05192v2 [cs.CV] 23 Jan 2015 


A Graph Theoretic Approach for Object Shape 
Representation in Compositional Hierarchies 
using a Hybrid Generative-Descriptive Model 


Umit Rusen Aktas*, Mete Ozay*, Ales Leonardis and Jeremy L. Wyatt 


School of Computer Science, The University of Birmingham, Edgbaston, 
Birmingham, B15 2TT, United Kingdom. 

Emails: {u.aktas, m.ozay, a.Leonardis, j.l.wyatt}@cs.bham.ac.uk


Abstract. A graph theoretic approach is proposed for object shape representation in a hierarchical compositional architecture called Compositional Hierarchy of Parts (CHOP). In the proposed approach, vocabulary learning is performed using a hybrid generative-descriptive model. First, statistical relationships between parts are learned using a Minimum Conditional Entropy Clustering algorithm. Then, selection of descriptive parts is defined as a frequent subgraph discovery problem, and solved using a Minimum Description Length (MDL) principle. Finally, part compositions are constructed by compressing the internal data representation with discovered substructures. Shape representation and computational complexity properties of the proposed approach and algorithms are examined using six benchmark two-dimensional shape image datasets. Experiments show that CHOP can employ part shareability and indexing mechanisms for fast inference of part compositions using learned shape vocabularies. Additionally, CHOP provides better shape retrieval performance than the state-of-the-art shape retrieval methods.


1 Introduction 

Hierarchical compositional architectures have been studied in the literature as representations for object detection [7], categorization [10,11,21] and parsing [25]. A detailed review of recent work is given in [26]. In this paper, we propose a graph theoretic approach for object shape representation in a hierarchical compositional architecture, called Compositional Hierarchy of Parts (CHOP), using a hybrid generative-descriptive model. CHOP enables us to measure and employ generative and descriptive properties of parts for the inference of part compositions in a graph theoretic framework, considering part shareability, indexing and matching mechanisms. We learn a compositional vocabulary of shape parts considering not just their statistical relationships but also their shape description properties to generate object shapes. In addition, we take advantage of integrated models that exploit part shareability in order to construct dense representations of shapes in learned vocabularies for fast indexing and matching.

* The first and second authors contributed equally.
Fig. 1: Overview of Compositional Hierarchy of Parts (CHOP) algorithm.


A diagram demonstrating the overall structure of learning in CHOP is given in Fig. 1. At the first layer $l = 1$ of CHOP, we extract Gabor features from a given set of images (Feature Extraction). We employ non-maxima suppression among Gabor feature maps in order to get local response peaks. We define parts as random graphs and represent part realizations as the instances of random graphs observed in a dataset. At each consecutive layer $l > 1$, we first learn the statistical relationships between parts using a Minimum Conditional Entropy Clustering (MCEC) algorithm [16] that measures conditional distributions of part realizations. For this purpose, we compute the statistical relationship between two parts $\mathcal{P}_i$ and $\mathcal{P}_j$ by measuring their co-occurrence statistics, for all parts represented in a learned vocabulary and for all realizations observed on images. Using the learned statistical and spatial relationships, we encode the input data in object graphs, where the nodes are part realizations and the edges encode discrete pairwise spatial relations (Object Graphs Generation). Next, we obtain compositions (parts) of the next layer by solving a frequent subgraph discovery problem. Each candidate subgraph (composition) is evaluated based on its ability to compress the object graphs, according to the Minimum Description Length (MDL) principle (Subgraph Discovery). Finally, part realizations for the next layer are located by compressing the object graphs using the discovered structures (Object Graphs Compression). These steps are applied recursively until no new compositions are inferred.

The paper is organised as follows. Related work and the contributions of the paper are summarized in the next section. The proposed Compositional Hierarchy of Parts (CHOP) algorithm is given in Section 3. Experimental analyses are given in Section 4, and Section 5 concludes the paper.
2 Related Work and Contribution

In [?] and [?], shape models are learned using hierarchical shape matching algorithms. Kokkinos and Yuille [?] first decompose object categories into parts and shape contours using a top-down approach. Then, they employ a Multiple Instance Learning algorithm to discriminatively learn the shape models using a bottom-up approach. However, part shareability and indexing mechanisms [?] are not employed, and are left as future work in [?]. Fidler, Boben and Leonardis [?] analyzed crucial properties that hierarchical compositional architectures should exhibit. Following their analyses, we develop an unsupervised generative-descriptive model for learning a vocabulary of parts considering part shareability, and for performing efficient inference of object shapes on test images using an indexing and matching method.

Fidler and Leonardis proposed a hierarchical architecture, called Learned Hierarchy of Parts (LHOP), for compositional representation of parts [10]. The main difference between LHOP and the proposed CHOP is that CHOP employs a hybrid generative-descriptive model for learning shape vocabularies using information theoretic methods in a graph theoretic framework. Specifically, CHOP first learns statistical relationships between a varying number of parts, i.e. compositions of an arbitrary number of parts, instead of the two-part compositions (duplets) used in LHOP [10,11]. Second, shape descriptive properties of parts are integrated with their statistical properties for inference of part compositions. In addition, the number of layers in the hierarchy is not pre-defined but is determined in CHOP according to the statistical properties of the data.

MDL models have been employed for statistical shape analysis [5,24], specifically to achieve compactness, specificity and generalization ability properties of shape models [5] and segmentation algorithms [6]. We employ MDL for the discovery of compositions of shape parts considering the statistical relationships between the parts, recursively in a hierarchical architecture. Hybrid generative-descriptive models have been used in [12] by employing Markov Random Fields and component analysis algorithms to construct descriptive and generative models, respectively. Although their approach is hierarchical, they do not learn compositional vocabularies of parts for shape representation.

Although our primary motivation is constructing a hierarchical compositional model for shape representation, we also examine the proposed algorithms for shape retrieval in the Experiments section. For this purpose, we compare the similarity between shapes using discriminative information about shape structures extracted from a learned vocabulary of parts and their realizations. Theoretical and experimental results of [20,22,23] on spectral properties of isomorphic graphs show that the eigenvalues of the adjacency matrices of two isomorphic graphs are ordered in an interval, and therefore provide useful information for discrimination of graphs. Assuming that shapes of the objects belonging to a category are represented (approximately) by isomorphic graphs, we can obtain discriminative information about the shape structures by analyzing spectral properties of the part realizations detected on the shapes.
Our contributions in this work are threefold:

1. We introduce a graph theoretic approach to represent objects and parts in compositional hierarchies. Unlike other hierarchical methods [?], CHOP learns shape vocabularies using a hybrid generative-descriptive model within a graph-based hierarchical compositional framework. The proposed approach uses graph theoretic tools to analyze, measure and employ geometric and statistical properties of parts to infer part compositions.

2. Two information theoretic methods are employed in the proposed CHOP algorithm to learn the statistical properties of parts and to construct compositions of parts. First, we learn the relationship between parts using MCEC [16]. Then, we select and infer compositions of parts according to their shape description properties defined by an MDL model.

3. CHOP employs a hybrid generative-descriptive model for hierarchical compositional representation of shapes. The proposed model differs from frequency-based approaches in that the part selection process is driven by the MDL principle, which selects parts that are both frequently observed and provide descriptive information for the representation of shapes.


3 Compositional Hierarchy of Parts 


In this section, we describe the algorithms employed in CHOP in its training and testing phases. In Section 3.1, we first describe the preprocessing algorithms that are used in both training and testing. Next, we introduce the vocabulary learning algorithms in Section 3.2. Then, we describe the inference algorithms performed on the test images in Section 3.3.


3.1 Preprocessing 

Given a set of images $S = \{(s_n, y_n)\}_{n=1}^{N}$, where $y_n \in \mathbb{Z}^+$ is the category label of an image $s_n$, we first extract a set of Gabor features $F_n = \{f_{nm}(x_{nm}) \in \mathbb{R}\}_{m=1}^{M}$ from each image $s_n$ using Gabor filters employed at location $x_{nm}$ in $s_n$ at $\Theta$ orientations [10]. Then, we construct a set of Gabor features $F = \bigcup_{n=1}^{N} F_n$. In this work, we compute the Gabor features at $\Theta = 6$ different orientations. In order to remove the redundancy of Gabor features, we perform non-maxima suppression: a Gabor feature with the Gabor response value $f_{nm}(x_{nm})$ is removed from $F_n$ unless $f_{nm}(x_{nm}) \geq f_{na}(x_{na})$ for all Gabor features extracted at $x_{na} \in \aleph(x_{nm})$, where $\aleph(x_{nm})$ is the set of image positions of the Gabor features that reside in the neighborhood of $x_{nm}$ defined by Euclidean distance in $\mathbb{R}^2$. Finally, we obtain a set of suppressed Gabor features $\hat{F}_n \subseteq F_n$ and $\hat{F} = \bigcup_{n=1}^{N} \hat{F}_n$.
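The suppression rule can be illustrated with a minimal Python sketch. It assumes the Gabor responses are already available as a NumPy array of shape (Θ, H, W), that the feature at each pixel is the strongest response over orientations, and that the neighbourhood ℵ(x) is a Euclidean disc of a hypothetical `radius`; the function name and the `threshold` parameter are illustrative and are not taken from the CHOP release.

```python
import numpy as np


def suppress_non_maxima(responses, radius=2.0, threshold=0.0):
    """Keep only Gabor features that are local maxima within a Euclidean disc.

    responses: array of shape (n_orientations, H, W), one map per orientation.
    Returns (row, col, orientation, value) tuples for the surviving features.
    """
    best = responses.max(axis=0)       # strongest response at each pixel
    label = responses.argmax(axis=0)   # orientation index giving that response
    height, width = best.shape
    r = int(np.ceil(radius))
    # Pixel offsets inside the disc of the given radius, excluding the centre.
    offsets = [(dy, dx)
               for dy in range(-r, r + 1)
               for dx in range(-r, r + 1)
               if (dy, dx) != (0, 0) and dy * dy + dx * dx <= radius * radius]
    kept = []
    for y in range(height):
        for x in range(width):
            v = best[y, x]
            if v <= threshold:
                continue  # ignore weak or empty responses
            # Keep the feature only if no in-image neighbour is strictly larger.
            if all(not (0 <= y + dy < height and 0 <= x + dx < width)
                   or best[y + dy, x + dx] <= v
                   for dy, dx in offsets):
                kept.append((y, x, int(label[y, x]), float(v)))
    return kept


responses = np.zeros((6, 5, 5))
responses[2, 2, 2] = 1.0   # strong peak at orientation 2
responses[0, 2, 3] = 0.5   # weaker neighbour inside the disc, suppressed
responses[5, 0, 0] = 0.3   # isolated peak outside the disc, kept
print(suppress_non_maxima(responses))  # [(0, 0, 5, 0.3), (2, 2, 2, 1.0)]
```

Ties are kept on purpose, matching the "greater than or equal to all neighbours" reading of the rule above.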


3.2 Learning a Vocabulary of Parts 

Given a set of training images $S^{tr}$, we first learn the statistical properties of parts using their realizations on images at a layer $l$. Then, we infer the compositions of parts at layer $l+1$ by minimizing the description length of the object descriptions defined as Object Graphs. In order to remove the redundancy of the compositions, we employ a local inhibition process that was suggested in [?]. Statistical learning of part structures, inference of compositions and local inhibition processes are performed by constructing compositions of parts at each layer, recursively; the details are given in the following subsections.

Definition 1 (Parts and Part Realizations).

The $i$-th part constructed at the $l$-th layer, $\mathcal{P}_i^l = (\mathcal{G}_i^l, \mathcal{Y}_i^l)$, is a tuple consisting of a directed random graph $\mathcal{G}_i^l = (\mathcal{V}_i^l, \mathcal{E}_i^l)$, where $\mathcal{V}_i^l$ is a set of nodes and $\mathcal{E}_i^l$ is a set of edges, and a random variable $\mathcal{Y}_i^l \in \mathbb{Z}^+$ which represents the identity number or label of the part. The realization $R_i^l(s_n) = (G_i^l(s_n), Y_i^l(s_n))$ of $\mathcal{P}_i^l$ is defined by 1) $Y_i^l(s_n)$, the realization of $\mathcal{Y}_i^l$ representing the label of the part realization on an image $s_n$, and 2) the directed graph $G_i^l(s_n) = (V_i^l(s_n), E_i^l(s_n))$, which is an instance of the random graph $\mathcal{G}_i^l$ computed on a training image $s_n \in S^{tr}$, where $V_i^l(s_n)$ is a set of nodes and $E_i^l(s_n)$ is a set of edges of $G_i^l(s_n)$, $\forall n = 1, 2, \ldots, N^{tr}$.

At the first layer $l = 1$, each node of $\mathcal{V}_i^1$ is a part label $\mathcal{Y}_i^1 \in \mathcal{V}_i^1$ taking values from the set $\{1, 2, \ldots, \Theta\}$, and $\mathcal{E}_i^1 = \emptyset$. Similarly, $E_i^1(s_n) = \emptyset$, and each node of $V_i^1(s_n)$ is defined as a Gabor feature $f_{na}(x_{na}) \in \hat{F}_n$ observed in the image $s_n \in S^{tr}$ at the image location $x_{na}$, i.e. the $a$-th realization of $\mathcal{P}_i^1$ observed in $s_n \in S^{tr}$ at $x_{na}$, $\forall n = 1, 2, \ldots, N^{tr}$. In the consecutive layers, the parts and part realizations are defined recursively by employing layer-wise mappings $\Psi_{l,l+1}$ defined as

$$\Psi_{l,l+1} : (\mathcal{P}^l, \mathcal{R}^l, G_l) \rightarrow (\mathcal{P}^{l+1}, \mathcal{R}^{l+1}), \quad \forall l = 1, 2, \ldots, L, \tag{1}$$

where $\mathcal{P}^l = \{\mathcal{P}_i^l\}_i$, $\mathcal{R}^l = \{R_i^l(s_n) : s_n \in S^{tr}\}_{i=1}^{B_l}$, $\mathcal{P}^{l+1} = \{\mathcal{P}_j^{l+1}\}_j$, $\mathcal{R}^{l+1} = \{R_j^{l+1}(s_n) : s_n \in S^{tr}\}_{j=1}^{B_{l+1}}$, and $G_l$ is an object graph, which is defined next. □

In the rest of this section, we will use $R_i^l(s_n) = R_i^l$, $\forall i = 1, 2, \ldots, B_l$, $\forall l = 1, 2, \ldots, L$, $\forall s_n \in S^{tr}$, for the sake of simplicity in the notation.

Definition 2 (Receptive and Object Graph).

A receptive graph of a part realization $R_i^l$ is a star-shaped graph $RG_i^l = (V_i^l, E_i^l)$, which is induced from a receptive field centered at the root node $R_i^l$. A directed edge $e_{ab}^l \in E_i^l$ is defined as

$$e_{ab}^l = \begin{cases} (a^l, b^l, \phi_{ab}^l), & \text{if } x_{nb} \in \aleph(x_{na}),\ a = i, \\ \emptyset, & \text{otherwise}, \end{cases} \tag{2}$$

where $\aleph(x_{na})$ is the set of part realizations that reside in a neighborhood of a part realization $R_a^l$ in an image $s_n$, $\forall R_a^l, R_b^l$, $b \neq i$, and $\forall s_n \in S^{tr}$. $\phi_{ab}^l$ defines the statistical relationship between $R_a^l$ and $R_b^l$, as explained in the next subsection.

The structure of part realizations observed at the $l$-th layer on the training set $S^{tr}$ is described using a directed graph $G_l = (V_l, E_l)$, called an object graph, where $V_l = \bigcup_i V_i^l$ is a set of nodes and $E_l = \bigcup_i E_i^l$ is a set of edges, and $V_i^l$ and $E_i^l$ are the sets of nodes and edges of a receptive graph $RG_i^l$, $\forall i$, respectively. □
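As a hedged illustration of Definition 2, the following Python sketch builds star-shaped receptive graphs and takes their union as the object graph. It assumes realizations are given as hypothetical `(label, (x, y))` pairs, that ℵ is a Euclidean disc of a chosen `radius`, and that a caller-supplied function `phi` stands in for the statistical relationship $\phi_{ab}^l$ of the next subsection.

```python
import math
from typing import Callable, List, Tuple

Realization = Tuple[int, Tuple[float, float]]  # (part label Y_a, image position x_na)
Edge = Tuple[int, int, object]                 # (a, b, phi_ab)


def receptive_graph(i: int, realizations: List[Realization], radius: float,
                    phi: Callable[[Realization, Realization], object]
                    ) -> Tuple[List[int], List[Edge]]:
    """Star-shaped receptive graph rooted at realization i."""
    _, x_i = realizations[i]
    nodes, edges = [i], []
    for b, r_b in enumerate(realizations):
        if b == i:
            continue
        if math.dist(x_i, r_b[1]) <= radius:   # x_nb lies in the neighbourhood of x_ni
            nodes.append(b)
            edges.append((i, b, phi(realizations[i], r_b)))  # edges only leave the root
    return nodes, edges


def object_graph(realizations: List[Realization], radius: float,
                 phi: Callable[[Realization, Realization], object]
                 ) -> Tuple[List[int], List[Edge]]:
    """Object graph G_l: union of the nodes and edges of all receptive graphs."""
    nodes, edges = set(), []
    for i in range(len(realizations)):
        v, e = receptive_graph(i, realizations, radius, phi)
        nodes.update(v)
        edges.extend(e)
    return sorted(nodes), edges


def label_pair(r_a: Realization, r_b: Realization):
    """Hypothetical stand-in for phi_ab: just the pair of part labels."""
    return (r_a[0], r_b[0])


reals = [(1, (0.0, 0.0)), (2, (1.0, 0.0)), (3, (5.0, 5.0))]
print(object_graph(reals, radius=2.0, phi=label_pair))
```

The third realization is too far from the others, so it appears as an isolated node with no edges.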
Learning of Statistical Relationships between Parts and Part Realizations
We compute the conditional distributions $P_{ij}(R_a^l \mid \mathcal{Y}_j^l = Y_b^l)$ for each $i = Y_a^l$ and $j = Y_b^l$ between all possible pairs of parts $(\mathcal{P}_i^l, \mathcal{P}_j^l)$ using $S^{tr}$ at the $l$-th layer. However, instead of detecting a single mode of these distributions, we select a set of modes $\mathcal{M}^l = \{\mathcal{M}_{ij}^l : i = 1, 2, \ldots;\ j = 1, 2, \ldots\}$. For this purpose, we define the mode computation problem as a Minimum Conditional Entropy Clustering problem [16]:

$$\mathcal{Z}_{ijk} := \arg\min_{\pi_k \in C} H(\pi_k, R_a^l \mid R_b^l), \tag{3}$$

$$H(\pi_k, R_a^l \mid R_b^l) = -\sum_{\forall x_{na} \in \aleph(x_{nb})} \sum_{k=1}^{K} P(\pi_k, R_a^l \mid R_b^l) \log P(\pi_k, R_a^l \mid R_b^l). \tag{4}$$

The first summation is over all part realizations $R_a^l$ that reside in a neighborhood of all $R_b^l$ such that $x_{na} \in \aleph(x_{nb})$, for all $i = Y_a^l$ and $j = Y_b^l$; $C$ is a set of cluster ids, $K = |C|$ is the number of clusters, $\pi_k \in C$ is a cluster label, and $P(\pi_k, R_a^l \mid R_b^l) = P_{ij}(\pi_k, R_a^l \mid \mathcal{Y}_j^l = Y_b^l)$.
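To make the objective in (4) concrete, the Python sketch below evaluates the double sum for a table of probabilities. It assumes the probabilities $P(\pi_k, R_a^l \mid R_b^l)$ have already been estimated and are stored as a hypothetical array with one row per neighbouring realization and one column per cluster; estimating those probabilities, which the MCEC algorithm of [16] also does, is not shown.

```python
import numpy as np


def conditional_entropy(p):
    """Evaluate Eq. (4): H = -sum over neighbours and clusters of P log P.

    p: array of shape (n_neighbours, K); p[a, k] plays the role of
    P(pi_k, R_a | R_b) for the a-th neighbouring realization R_a of R_b.
    Zero entries are skipped, following the usual 0 log 0 = 0 convention.
    """
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log(p[nz])))


hard = np.array([[1.0, 0.0], [0.0, 1.0]])   # every neighbour sits firmly in one cluster
soft = np.array([[0.5, 0.5], [0.5, 0.5]])   # maximally ambiguous assignment
print(conditional_entropy(hard), conditional_entropy(soft))
```

A clustering that assigns each neighbour unambiguously yields zero entropy, while the ambiguous assignment yields $2 \ln 2$; minimizing (4) therefore favours compact, well-separated modes of the spatial distributions.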

The pairwise statistical relationship between two part realizations $R_a^l$ and $R_b^l$ is represented as $\mathcal{M}_{ij}^l = (i, j, c_{ijk}, \mathcal{Z}_{ijk})$, where $c_{ijk}$ is the center position of the $k$-th cluster. In the construction of an object graph $G_l$ at the $l$-th layer, we compute $\phi_{ab}^l = (c_{ijk}, k)$, $\forall a, b$, as $k = \arg\min_k \|d_{ab} - c_{ijk}\|_2$, where $\|\cdot\|_2$ is the Euclidean distance, $i = Y_a^l$ and $j = Y_b^l$, $d_{ab} = x_{na} - x_{nb}$, and $x_{na}$ and $x_{nb}$ are the positions of $R_a^l$ and $R_b^l$ in an image $s_n$, respectively.
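The edge label $\phi_{ab}^l$ is a nearest-centre lookup, sketched below in Python. The cluster centres are assumed to be given as a hypothetical (K, 2) array for the label pair $(i, j)$; note that the returned index is 0-based, whereas the text indexes clusters from 1.

```python
import numpy as np


def edge_label(x_a, x_b, centers):
    """phi_ab = (c_ijk, k) with k = argmin_k ||d_ab - c_ijk||_2 and d_ab = x_a - x_b.

    centers: array-like of shape (K, 2) holding the cluster centres learned for
    the label pair (i, j) = (Y_a, Y_b). The returned k is 0-based.
    """
    d_ab = np.asarray(x_a, dtype=float) - np.asarray(x_b, dtype=float)
    centers = np.asarray(centers, dtype=float)
    k = int(np.argmin(np.linalg.norm(centers - d_ab, axis=1)))
    return centers[k], k


centers = [[5.0, 0.0], [0.0, 5.0], [-5.0, 0.0]]   # hypothetical learned modes
c, k = edge_label((10, 3), (4, 2), centers)       # d_ab = (6, 1): closest to (5, 0)
print(c, k)
```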

Inference of Compositions of Parts using MDL
Given a set of parts $\mathcal{P}^l$, a set of part realizations $\mathcal{R}^l$ and an object graph $G_l$ at the $l$-th layer, we infer compositions of parts at the $(l+1)$-st layer by computing a mapping $\Psi_{l,l+1}$ in (1). In this mapping, we search for a structure which best describes the structure of parts $\mathcal{P}^l$ as the compositions constructed at the $(l+1)$-st layer by minimizing the length of description of $\mathcal{P}^l$. In the inference process, we search for a set of graphs $\mathcal{G}^{l+1}$ which minimizes the description length of $G_l$ as

$$\mathcal{G}^{l+1} = \arg\min_{\mathcal{G}_j^{l+1}} \mathrm{value}(\mathcal{G}_j^{l+1}, G_l), \tag{5}$$

where

$$\mathrm{value}(\mathcal{G}_j^{l+1}, G_l) = \frac{DL(\mathcal{G}_j^{l+1}) + DL(G_l \mid \mathcal{G}_j^{l+1})}{DL(G_l)} \tag{6}$$

is the compression value of an object graph $G_l$ given a subgraph $\mathcal{G}_j^{l+1}$ of a receptive graph $RG_j^l$, $\forall j = 1, 2, \ldots$. The description length $DL$ of a graph $G$ is calculated using the number of bits needed to represent node labels, edge labels and the adjacency matrix, as explained in [3]. The inference process consists of two steps:
Fig. 2: Valid and invalid candidates. (a) $\mathcal{G}_1^{l+1}$ (valid), (b) $\mathcal{G}_2^{l+1}$ (valid), (c) $\mathcal{G}_3^{l+1}$ (invalid), (d) $\mathcal{G}_4^{l+1}$ (invalid).

1. Enumeration: In the graph enumeration step, candidate graphs $\mathcal{G}^{l+1}$ are generated from $G_l$. However, each $\mathcal{G}_j^{l+1} \in \mathcal{G}^{l+1}$ is required to include nodes $\mathcal{V}_j^{l+1}$ and edges $\mathcal{E}_j^{l+1}$ from only one receptive graph $RG_i^l$, $\forall i$. This selective candidate generation procedure forces $\mathcal{G}_j^{l+1}$ to represent an area around its centre node. Examples of valid and invalid candidates are illustrated in Fig. 2. $\mathcal{G}_1^{l+1}$ and $\mathcal{G}_2^{l+1}$ are valid structures since each graph is inferred from a single receptive graph, i.e. $RG_1^l$ and $RG_2^l$, respectively. The invalid graphs $\mathcal{G}_3^{l+1}$ and $\mathcal{G}_4^{l+1}$ are not enumerated since their nodes/edges are inferred from multiple receptive graphs.

2. Evaluation: Once we obtain $\mathcal{G}^{l+1}$ by solving (5) subject to the constraints provided in the previous step, we compute a set of graph instances of part realizations $G^{l+1} = \{G_j^{l+1}\}_j$ such that $G_j^{l+1} \in iso(\mathcal{G}_j^{l+1})$ and $G_j^{l+1} \subseteq G_l$, where $iso(\mathcal{G}_j^{l+1})$ is the set of all subgraphs that are isomorphic to $\mathcal{G}_j^{l+1}$. This problem is defined as a subgraph isomorphism problem [4], which is NP-complete in general. In this work, the proposed graph structures are acyclic and star-shaped, enabling us to solve this problem in polynomial time. In order to obtain the two sets of subgraphs $\mathcal{G}^{l+1}$ and $G^{l+1}$ by solving (5), we have implemented a simplified version of the substructure discovery system SUBDUE [4], which is employed in a restricted search space. The discovery algorithm is explained in Algorithm 1. The key difference between the original SUBDUE and our implementation is that in the extension step (Step 4), childList contains only star-shaped graphs, which are extended from parentList by single nodes. The parameters beam, numBest and bestPartSize are used to prune the search space.
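Why the star-shaped restriction keeps matching tractable can be seen in a small Python sketch (an illustration, not the authors' SUBDUE code). It assumes non-induced matching, a pattern given as a hypothetical `(centre_label, [(leaf_label, edge_label), ...])` pair, and receptive graphs in the same form: since every edge leaves the centre and each neighbour is reached by one edge, a pattern embeds in a receptive graph exactly when the centre labels agree and the host offers every (leaf label, edge label) pair at least as often as the pattern needs it, which a `collections.Counter` comparison checks in linear time.

```python
from collections import Counter


def match_star(pattern, host_centre_label, host_leaves):
    """True if a star pattern embeds in a star-shaped receptive graph.

    pattern: (centre_label, [(leaf_label, edge_label), ...])
    host_leaves: [(leaf_label, edge_label), ...] for the root's out-edges.
    """
    centre_label, leaves = pattern
    if centre_label != host_centre_label:
        return False
    need, have = Counter(leaves), Counter(host_leaves)
    # Missing keys count as 0 in a Counter, so absent leaf types fail the test.
    return all(have[key] >= count for key, count in need.items())


def find_instances(pattern, receptive_graphs):
    """Indices of the receptive graphs that contain the pattern."""
    return [idx for idx, (label, leaves) in enumerate(receptive_graphs)
            if match_star(pattern, label, leaves)]


pattern = (1, [(2, "e"), (2, "e"), (3, "w")])
graphs = [
    (1, [(2, "e"), (3, "w"), (2, "e"), (4, "n")]),   # contains the pattern
    (1, [(2, "e"), (3, "w")]),                       # one (2, "e") leaf short
    (5, [(2, "e"), (2, "e"), (3, "w")]),             # wrong centre label
]
print(find_instances(pattern, graphs))
```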

The label of a part $\mathcal{P}_j^{l+1}$ is defined according to its compression value, $\mathrm{value}(\mathcal{G}_j^{l+1}, G_l)$, computed using (6). We sort the compression values in ascending order and use the index of the part's compression value in this sorted list as its label.
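The compression value (6) and the label assignment can be sketched in Python under strong simplifying assumptions: the description length below is a hypothetical bit count (a label per node, plus two endpoints and a label per edge), not the encoding of [3], and each non-overlapping instance of a star part is assumed to collapse into a single node carrying one new label. Labels are 1-based ranks because part labels are positive integers in Definition 1.

```python
import math


def description_length(n_nodes, n_edges, n_node_labels, n_edge_labels):
    """Simplified DL in bits: a label per node, plus two endpoints and a label per edge."""
    node_bits = n_nodes * math.log2(max(n_node_labels, 1))
    edge_bits = n_edges * (2 * math.log2(max(n_nodes, 2))
                           + math.log2(max(n_edge_labels, 1)))
    return node_bits + edge_bits


def compression_value(graph, leaves, n_instances):
    """Eq. (6) for a star part with `leaves` leaves and `n_instances` disjoint instances.

    graph: dict with keys 'nodes', 'edges', 'node_labels', 'edge_labels'.
    Each instance (leaves + 1 nodes, leaves edges) is replaced by one node with
    a new label, so G | g loses `leaves` nodes and `leaves` edges per instance.
    """
    dl_g = description_length(leaves + 1, leaves,
                              graph["node_labels"], graph["edge_labels"])
    dl_G = description_length(graph["nodes"], graph["edges"],
                              graph["node_labels"], graph["edge_labels"])
    dl_G_given_g = description_length(graph["nodes"] - n_instances * leaves,
                                      graph["edges"] - n_instances * leaves,
                                      graph["node_labels"] + 1, graph["edge_labels"])
    return (dl_g + dl_G_given_g) / dl_G


def assign_labels(values):
    """Sort compression values in ascending order; a part's label is its 1-based rank."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    return {j: rank + 1 for rank, j in enumerate(order)}


G = {"nodes": 100, "edges": 90, "node_labels": 6, "edge_labels": 4}
v_frequent = compression_value(G, leaves=3, n_instances=20)   # compresses G well
v_absent = compression_value(G, leaves=3, n_instances=0)      # only adds bits
print(v_frequent, v_absent, assign_labels([v_absent, v_frequent]))
```

A value below 1 means the part shortens the description of the object graph, so frequent parts receive the smallest labels.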

After the sets of graphs and part labels are obtained at the $(l+1)$-st layer, we construct a set of parts $\mathcal{P}^{l+1} = \{\mathcal{P}_j^{l+1}\}_j$, where $\mathcal{P}_j^{l+1} = (\mathcal{G}_j^{l+1}, \mathcal{Y}_j^{l+1})$. We call $\mathcal{P}^{l+1}$ a set of compositions of the parts from $\mathcal{P}^l$, constructed at the $(l+1)$-st layer. Similarly, we extract a set of part realizations $\mathcal{R}^{l+1} = \{R_j^{l+1}\}_j$, where $R_j^{l+1} = (G_j^{l+1}, Y_j^{l+1})$. In order to remove the redundancy in $\mathcal{R}^{l+1}$, we perform local inhibition as in [?] and obtain a new set of part realizations $\hat{\mathcal{R}}^{l+1} \subseteq \mathcal{R}^{l+1}$.
Input: $G_l = (V_l, E_l)$: object graph; beam, numBest, bestPartSize.
Output: parts $\mathcal{P}^{l+1}$, realizations $\mathcal{R}^{l+1}$.

parentList := null; childList := null; bestPartList := null;
  // childList and bestPartList are priority queues ordered by MDL scores.
Initialize parentList with frequent single-node parts;
while parentList is not empty do
    Extend parts in parentList in all possible ways into childList;
    Evaluate parts in childList using (6);
    Trim childList to the top beam parts;
    Merge elements of childList and bestPartList into bestPartList;
    parentList := null;
    Swap parentList and childList;
end
Trim bestPartList to the top numBest parts;
$\mathcal{P}^{l+1}$ := bestPartList;
$\mathcal{R}^{l+1}$ := bestPartList.getInstances($G_l$);

Algorithm 1: Inference of new compositions.
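The beam search of Algorithm 1 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the released implementation: a part is a hypothetical `(centre_label, leaves)` pair with `leaves` a sorted tuple of `(leaf_label, edge_label)` pairs, `score` is any caller-supplied compression value (lower is better), the caller pre-filters the frequent single-node parts, and `best_part_size` is read as a cap on the number of leaves. bestPartList is kept as a dictionary and trimmed once at the end, which yields the same top `num_best` parts as merging and trimming.

```python
def discover_parts(centre_labels, leaf_options, score, beam=4, num_best=3,
                   best_part_size=3):
    """Beam search over star-shaped parts, following the loop of Algorithm 1."""
    parent_list = [(c, ()) for c in centre_labels]   # single-node parts
    best_parts = {}                                  # part -> compression value
    while parent_list:
        # Extend every parent by one leaf in all possible ways (stars only).
        child_set = set()
        for centre, leaves in parent_list:
            if len(leaves) >= best_part_size:
                continue
            for leaf in leaf_options:
                child_set.add((centre, tuple(sorted(leaves + (leaf,)))))
        # Evaluate and keep the `beam` best children (ties broken by the part itself).
        children = sorted(child_set, key=lambda p: (score(p), p))[:beam]
        for part in children:
            best_parts[part] = score(part)
        parent_list = children                       # swap: children become parents
    ranked = sorted(best_parts, key=lambda p: (best_parts[p], p))
    return ranked[:num_best]


def toy_score(part):
    """Hypothetical score rewarding ('c', 'y') leaves and mildly penalising size."""
    leaves = part[1]
    return -leaves.count(("c", "y")) + 0.1 * len(leaves)


print(discover_parts(["a"], [("b", "x"), ("c", "y")], toy_score,
                     beam=2, num_best=2, best_part_size=2))
```

The loop terminates because every iteration adds one leaf and no part may exceed `best_part_size` leaves.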


Incremental Construction of the Vocabulary 

Definition 3 (Vocabulary). A tuple $\Omega_l = (\mathcal{P}^l, \mathcal{M}^l)$ is the vocabulary constructed at the $l$-th layer using the training set $S^{tr}$. The vocabulary of a CHOP with $L$ layers is defined as the set $\Omega = \{\Omega_l : l = 1, 2, \ldots, L\}$. □

We construct the vocabulary $\Omega$ of CHOP incrementally, as described in the pseudo-code of the vocabulary learning algorithm given in Algorithm 2. In the first step of the algorithm, we extract a set of Gabor features $F_n = \{f_{nm}(x_{nm})\}_{m=1}^{M}$ from each image $s_n \in S^{tr}$ using Gabor filters employed at location $x_{nm}$ in $s_n$ at $\Theta$ orientations. Then, in the second step, we perform local inhibition of Gabor features using non-maxima suppression to construct a set of suppressed Gabor features $\hat{F}_n \subseteq F_n$, as described in Section 3.1. Next, we initialize the variable $l$, which defines the layer index, and we construct parts $\mathcal{P}^1$ and part realizations $\mathcal{R}^1$ at the first layer as described in Definition 1.

In steps 5-11, we incrementally construct the vocabulary of the CHOP. In step 5, we compute the sets of modes $\mathcal{M}^l$ by learning statistical relationships between part realizations as described in Section 3.2. In the sixth step, we construct an object graph $G_l$ using $\mathcal{M}^l$ as explained in Definition 2, and we construct the vocabulary $\Omega_l = (\mathcal{P}^l, \mathcal{M}^l)$ at the $l$-th layer in step 7. Next, we infer the part graphs $\mathcal{G}^{l+1}$ that will be constructed at the next layer by computing the mapping $\Psi_{l,l+1}$. For this purpose, we solve (5) using our graph mining implementation to obtain a set of parts $\mathcal{P}^{l+1}$ and a set of part realizations $\mathcal{R}^{l+1}$, as explained in Section 3.2. We increment $l$ in step 10, and subsample the positions of part realizations $R_i^l$ by a factor of $\sigma$, $\forall n, R_i^l$, in step 11, which effectively increases the area of the receptive fields through upper layers. We iterate steps 5-11 while a non-empty part graph $\mathcal{G}^l$ is either obtained from the training images at the first layer,
Input:
  - $S^{tr} = \{s_n\}_{n=1}^{N}$: Training dataset,
  - $\Theta$: The number of different orientations of Gabor features,
  - $\sigma$: Subsampling ratio.
Output: Vocabulary $\Omega$.

1  Extract a set of Gabor features $F^{tr} = \bigcup_{n=1}^{N} F_n^{tr}$, where $F_n^{tr} = \{f_{nm}(x_{nm})\}_{m=1}^{M}$, from each image $s_n \in S^{tr}$;
2  Construct a set of suppressed Gabor features $\hat{F}^{tr} \subseteq F^{tr}$ (see Section 3.1);
3  $l := 1$;
4  Construct $\mathcal{P}^1$ and $\mathcal{R}^1$ (see Definition 1);
   while $\mathcal{G}^l \neq \emptyset$ do
5      Compute the sets of modes $\mathcal{M}^l$ (see Section 3.2);
6      Construct $G_l$ using $\mathcal{M}^l$ (see Definition 2);
7      Construct $\Omega_l = (\mathcal{P}^l, \mathcal{M}^l)$;
8      Infer part graphs $\mathcal{G}^{l+1}$ by solving (5) (see Section 3.2);
9      Construct $\mathcal{P}^{l+1}$ and $\mathcal{R}^{l+1}$ (see Section 3.2);
10     $l := l + 1$;
11     Subsample the positions of part realizations $R_i^l$ by a factor of $\sigma$, $\forall n, R_i^l$;
   end
12 $\Omega = \{\Omega_t : t = 1, 2, \ldots, l - 1\}$;

Algorithm 2: The vocabulary learning algorithm of Compositional Hierarchy of Parts.


or inferred from $\Omega_{l-1}$, $\mathcal{R}^{l-1}$ and $G_{l-1}$ at $l > 1$, i.e. $\mathcal{G}^l \neq \emptyset$, $\forall l > 1$. As the output of the algorithm, we obtain the vocabulary of CHOP, $\Omega = \{\Omega_l : l = 1, 2, \ldots, L\}$.
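The control flow of Algorithm 2 can be summarized in a short Python sketch. Every callable argument (`extract`, `suppress`, `first_layer`, `learn_modes`, `build_object_graph`, `infer_parts`, `subsample`) is a hypothetical stand-in for the corresponding step; the point is only the loop structure and its stopping rule, which ends the hierarchy as soon as no new part graphs are inferred.

```python
def learn_vocabulary(images, extract, suppress, first_layer, learn_modes,
                     build_object_graph, infer_parts, subsample, sigma=0.5):
    """Control flow of Algorithm 2; every callable argument is a hypothetical stand-in."""
    features = [suppress(extract(s)) for s in images]       # steps 1-2
    parts, realizations = first_layer(features)             # steps 3-4 (layer l = 1)
    vocabulary = []                                          # Omega_1, Omega_2, ...
    while parts:                                             # stop when no part graph is inferred
        modes = learn_modes(realizations)                    # step 5: M^l
        graph = build_object_graph(realizations, modes)      # step 6: G_l
        vocabulary.append((parts, modes))                    # step 7: Omega_l = (P^l, M^l)
        parts, realizations = infer_parts(graph)             # steps 8-9: P^{l+1}, R^{l+1}
        realizations = subsample(realizations, sigma)        # steps 10-11
    return vocabulary                                        # step 12


# Toy stubs: each layer keeps every second realization until one is left.
vocab = learn_vocabulary(
    images=[list(range(8))],
    extract=lambda s: s,
    suppress=lambda f: f,
    first_layer=lambda feats: (["layer-1 part"], feats[0]),
    learn_modes=lambda r: len(r),
    build_object_graph=lambda r, m: r,
    infer_parts=lambda g: (["part"], g[::2]) if len(g) > 1 else ([], []),
    subsample=lambda r, s: r,
)
print(len(vocab))
```

The number of layers $L$ is therefore not fixed in advance; it is whatever depth the data supports, as stated in Section 2.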


3.3 Inference of Object Shapes on Test Images 


In the testing phase, we infer shapes of objects on test images $s_n \in S^{te}$ using the learned vocabulary of parts $\Omega$. The flow of our inference algorithm resembles that of learning, as shown in Fig. 3. The only difference between the learning and inference processes is that no new subgraphs are discovered from the input image during inference; instead, learned compositions are matched to their instances (Subgraph Matching). Algorithm 3 explains the inference algorithm for test images. We incrementally construct a set of inference graphs $\mathcal{T}(s_n) = \{\mathcal{T}_l(s_n)\}_{l=1}^{L}$ of a given test image $s_n \in S^{te}$ using the learned vocabulary $\Omega$. At each $l$-th layer, we construct a set of part realizations $\mathcal{R}^l(s_n) = \{R_i^l(s_n) = (G_i^l(s_n), Y_i^l(s_n))\}_i$ and an object graph $G_l = (V_l, E_l)$ of $s_n$, $\forall l = 1, 2, \ldots, L$. The test image is processed in the same manner as in vocabulary learning (steps 1-5). In step 6, isomorphisms of the part graph descriptions $\mathcal{G}^{l+1}$ obtained from $\Omega_{l+1}$ are searched in $G_l$ in polynomial time (see Section 3.2). Part realizations $\mathcal{R}^{l+1}$ of the new object graph $G_{l+1}$ are extracted from $G^{l+1}$ in step 7. The discovery process continues until no new realizations are found.
[Diagram panels: Input Image, Feature Extraction, Gabor responses, Object Graph Generation, Subgraph Matching, Object Graph Compression, Indexing to Higher Layers, Vocabulary levels 1 to L.]

Fig. 3: Inference in Compositional Hierarchy of Parts (CHOP) framework.


At the first layer $l = 1$, the nodes of the instance graph $G_i^1(s_n)$ of a part realization $R_i^1(s_n)$ represent the Gabor features $f_{na}(x_{na}) \in \hat{F}_n^{te}$ observed in the image $s_n \in S^{te}$ at an image location $x_{na}$, as described in Section 3.2. In order to infer the graph instances and compositions of part realizations in the following layers $1 < l < L$, we employ a graph matching algorithm that constructs $G_i^{l+1}(s_n) = \{H(\mathcal{P}^{l+1}) : H(\mathcal{P}^{l+1}) \subseteq G_l\}$, which is a set of subgraph isomorphisms $H(\mathcal{P}^{l+1})$ of the part graphs $\mathcal{G}^{l+1}$ in $\mathcal{P}^{l+1}$, computed in $G_l$.


Input:
  - $s$: Test image,
  - $\Omega$: Vocabulary,
  - $\Theta$: The number of different orientations of Gabor features,
  - $\sigma$: Subsampling ratio.
Output: Inference graph $\mathcal{T}(s)$.

1  Extract a set of Gabor features $F = \{f_m(x_m)\}_{m=1}^{M}$ from image $s$;
2  Construct a set of suppressed Gabor features $\hat{F} \subseteq F$ (see Section 3.1);
3  $l := 1$;
4  Construct $\mathcal{R}^1$ from $\hat{F}$ (see Definition 1);
   while $\Omega_{l+1} \neq \emptyset \wedge \mathcal{R}^l \neq \emptyset$ do
5      Construct $G_l$ using $\mathcal{M}^l$ in $\Omega_l$;
6      Find graph instances of part realizations $G^{l+1} = \{G_j^{l+1}\}_j$ such that $G_j^{l+1} \in iso(\mathcal{G}_j^{l+1})$ and $G_j^{l+1} \subseteq G_l$ (see Section 3.2, Evaluation);
7      Construct $\mathcal{R}^{l+1}$ from $G^{l+1}$ (see Section 3.2);
8      $l := l + 1$;
9      Subsample the positions of part realizations $R_i^l$ by a factor of $\sigma$, $\forall R_i^l$;
   end
10 $\mathcal{T}(s) = \{G_t : t = 1, 2, \ldots\}$;

Algorithm 3: Object shape inference algorithm for test images.
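A Python sketch of the inference loop in Algorithm 3 follows, including one possible reading of the subsampling step. The loop mirrors the stopping rule $\Omega_{l+1} \neq \emptyset \wedge \mathcal{R}^l \neq \emptyset$; `build_graph` and `match` are hypothetical callables, and the choice to floor scaled positions and keep only the first realization of each label per cell is an assumption, since the text only states that positions are subsampled by $\sigma$.

```python
import math


def subsample(realizations, sigma):
    """Scale positions by sigma (floored) and keep one realization per (label, cell).

    realizations: list of (label, (x, y)). Merging coincident realizations is an
    assumption; the paper only states that positions are subsampled by sigma.
    """
    seen, out = set(), []
    for label, (x, y) in realizations:
        cell = (math.floor(x * sigma), math.floor(y * sigma))
        if (label, cell) not in seen:
            seen.add((label, cell))
            out.append((label, cell))
    return out


def infer(realizations, vocabulary, build_graph, match, sigma=0.5):
    """Control flow of Algorithm 3; build_graph and match are hypothetical callables."""
    inference_graphs = []
    l = 0  # 0-based index into vocabulary, i.e. layer l + 1 in the paper's notation
    while l + 1 < len(vocabulary) and realizations:
        graph = build_graph(realizations, vocabulary[l])    # step 5: object graph G_l
        inference_graphs.append(graph)
        realizations = match(graph, vocabulary[l + 1])      # steps 6-7: R^{l+1}
        l += 1                                              # step 8
        realizations = subsample(realizations, sigma)       # step 9
    return inference_graphs


print(subsample([(1, (4, 6)), (1, (5, 7)), (2, (4, 6))], 0.5))
graphs = infer([(1, (0, 0)), (1, (4, 4))], ["v1", "v2", "v3"],
               build_graph=lambda r, v: (v, len(r)),
               match=lambda g, v: [(v, (0, 0))])
print(graphs)
```

Unlike learning, nothing new is added to the vocabulary here: each layer only looks up compositions that already exist in $\Omega_{l+1}$.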
Fig. 4: Analyses with different number of categories. (Best viewed in colour)


4 Experiments 

We examine our proposed approach and algorithms on six benchmark object shape datasets, namely the Washington image dataset (Washington) [1], the MPEG-7 Core Experiment CE-Shape 1 dataset [14], the ETHZ Shape Classes dataset [9], the 40-sample articulated Tools dataset (Tools-40) [17], the 35-sample multi-class Tools dataset (Tools-35) [?] and the Myth dataset [2]. In the experiments, we used $\Theta = 6$ different orientations of Gabor features with the same Gabor kernel parameters implemented in [?]. We used a subsampling ratio of $\sigma = 0.5$. A Matlab implementation of CHOP is available online¹. Additional analyses related to part shareability and qualitative results are given in the Supplementary Material.


4.1 Analysis of Generative and Descriptive Properties 

In three different setups, we analyze how the vocabulary size, average MDL values and test inference time vary with the number of classes, objects and views, respectively. Vocabulary size and test inference time analyses provide information about the part shareability and generative shape representation behavior of our algorithm; the inference time of CHOP is reported as the average inference time on test images. We examine the variations of the average MDL values under different test sets. In order to get a more descriptive estimate of MDL values, we use the 10 best parts constructed at each layer of CHOP. While a vocabulary layer may contain thousands of parts, most of the parts constructed with the lowest MDL scores belong to a single object in the model, and therefore exhibit no shareability.

Analyses with Different Number of Categories
In this section, we use the first 30 categories of the MPEG-7 Core Experiment CE-Shape 1 dataset [14]. We randomly select 5 images from each category to construct training sets.

The vocabulary size grows sub-linearly, as shown by the blue line in Fig. 4a. The higher part shareability observed in the first layers of CHOP is considered to be

1 https://github.com/rusen/CHOP.git 
Fig. 5: Analyses with different number of objects. (Best viewed in colour)


the main contributing factor affecting the vocabulary size. This sub-linear growth of the number of parts with the number of categories also affects the test image inference time, as shown in Fig. 4c, because the inference process requires searching for every composition in the vocabulary within the graph representation of a test image. The efficient indexing mechanism implemented in CHOP speeds up testing, and the average test time is measured as 0.5-3 seconds depending on the number of categories. Average MDL values (lower is better) tend to increase after a jump at around 3-4 categories, and converge at 15 categories. The inter-class appearance differences allow for only a limited amount of shareability between categories.


4.2 Analyses with Different Number of Objects 

In order to analyze the effect of an increasing number of images on the proposed performance measures, we use 30 samples belonging to the "Apple Logos" class in the ETHZ Shape Classes dataset [9] for training. Compared to the results obtained in the previous section, we observe that average MDL values increase gradually as the number of objects increases (Fig. 5b). Additionally, the growth rate of the vocabulary size observed in Fig. 5a is lower than the one depicted in Fig. 4a.


4.3 Analyses with Different Number of Views 

In the third set of experiments, we use a subset of the Washington image dataset [1] consisting of images of the same object captured from different views. Multiple-view images of a cup are used as the training data. Since a cup is fairly symmetric apart from its texture and handle, the shareability of the parts in the vocabulary remains consistent as the training image set grows. Interestingly, we observe a local maximum at around 15 views in Fig. 6b. Depending on the inhibition and part selection (SUBDUE) parameters, less frequently observed yet valuable parts may be discarded by the algorithm in mid-layers.
Fig. 6: Analyses with different number of views. (Best viewed in colour)


4.4 Shape Retrieval Experiments 

Following the results of [20,22,23], we employ the eigenvalues of adjacency matrices of edge-weighted graphs, computed from the object graphs of shapes, as shape descriptors. For this purpose, we first define the edge weights $e_{ab} \in \mathcal{E}_l$ of an edge-weighted graph $W_l = (V_l, \mathcal{E}_l)$ of an object graph $G_l = (V_l, E_l)$ as

$$e_{ab} = \begin{cases} \pi_k, & \text{if } R_a^l \text{ is connected to } R_b^l,\ \forall R_a^l, R_b^l \in V_l, \\ 0, & \text{otherwise}, \end{cases}$$

where $\pi_k$ is the cluster index which minimizes the conditional entropy (4) in (3). Then, we compute the weighted adjacency matrix of $W_l$ and use its eigenvalues as shape descriptors. We compute the distance between two shapes as the Euclidean distance between their shape descriptors.
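The descriptor and distance can be sketched in Python with NumPy. Two choices here are assumptions not stated in the text: the directed weighted adjacency matrix is symmetrised so that `numpy.linalg.eigvalsh` returns real eigenvalues, and descriptors of graphs with different numbers of nodes are zero-padded to a common length before taking the Euclidean distance. Relabelling the nodes of a graph (an isomorphism) leaves the spectrum unchanged, which the usage example checks.

```python
import numpy as np


def spectral_descriptor(weights):
    """Eigenvalues (descending) of the symmetrised weighted adjacency matrix."""
    w = np.asarray(weights, dtype=float)
    sym = (w + w.T) / 2.0                  # assumption: symmetrise the directed graph
    return np.sort(np.linalg.eigvalsh(sym))[::-1]


def shape_distance(w1, w2):
    """Euclidean distance between descriptors, zero-padding the shorter one."""
    d1, d2 = spectral_descriptor(w1), spectral_descriptor(w2)
    n = max(len(d1), len(d2))
    d1 = np.pad(d1, (0, n - len(d1)))
    d2 = np.pad(d2, (0, n - len(d2)))
    return float(np.linalg.norm(d1 - d2))


A = np.array([[0, 2, 0], [0, 0, 1], [0, 0, 0]])   # edge weights pi_k on a small path
perm = [2, 0, 1]
B = A[np.ix_(perm, perm)]                          # same graph with relabelled nodes
C = np.array([[0, 1, 1], [0, 0, 0], [0, 0, 0]])   # a different star-shaped graph
print(shape_distance(A, B), shape_distance(A, C))
```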

In the first set of experiments, we compare the retrieval performance of CHOP with that of state-of-the-art shape classification algorithms that use inner-distance (ID) measures to compute shape descriptors which are robust to articulation [13]. The experiments are performed on the Tools-40 dataset [13], which contains 40 images of 8 different objects, each of which provides 5 articulated shapes. For each query image, the four most similar matches are chosen from the other images in the dataset to evaluate the recognition results [13]. Table 1 summarizes the results as the number of first, second, third and fourth most similar matches that come from the correct object. We observe that CHOP performs better than the shape-based descriptors and retrieval algorithms SC+DP and MDS+SC+DP [13]. IDSC+DP [13], which integrates texture information with the shape information, performs better than CHOP for Top 1 retrieval (40/40 versus 37/40), whereas CHOP performs better than IDSC+DP for Top 4 retrieval (29/40 versus 27/40). The reason for this observation is that the texture of shape structures provides discriminative information about shape categories. Therefore, the objects with the most similar textures are closer to each other than the other objects, as observed in the Top 1 retrieval results. On the other hand, texture information may dominate the shape information and lead to overfitting, as observed in the Top 4 retrieval results (see Table 1).
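The Top 1 to Top 4 evaluation protocol described above can be sketched as follows, assuming a precomputed matrix of pairwise shape distances (for instance, Euclidean distances between spectral descriptors) and one object label per image. The function name and inputs are hypothetical, and breaking distance ties by image index is our own assumption.

```python
import numpy as np


def topk_counts(D, labels, k=4):
    """Count, for each rank r = 1..k, the queries whose r-th match is correct.

    D is an n x n matrix of pairwise shape distances and labels[i] is the
    object of image i (hypothetical inputs). The query itself is excluded
    from its own ranking, and ties are broken by image index.
    """
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    counts = np.zeros(k, dtype=int)
    for q in range(len(labels)):
        # Rank the other images by increasing distance to the query.
        order = np.argsort(D[q], kind="stable")
        order = order[order != q]
        for r in range(k):
            if labels[order[r]] == labels[q]:
                counts[r] += 1
    return counts.tolist()
```

For Tools-40 (40 images, 8 objects with 5 shapes each), `topk_counts(D, labels, k=4)` returns four counts out of 40 in the format of Table 1.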
Table 1: Comparison of shape retrieval performances on the Tools-40 dataset (number of correct matches out of 40 queries).

Algorithms        Top 1   Top 2   Top 3   Top 4
SC+DP [13]        20/40   10/40   11/40    5/40
MDS+SC+DP [13]    36/40   26/40   17/40   15/40
IDSC+DP [13]      40/40   34/40   35/40   27/40
CHOP              37/40   35/40   35/40   29/40


Table 2: Comparison of shape retrieval performances (%) on the Myth and Tools-35 datasets.

Datasets   Contour-ID [18]   Contour-HF [18]   CHOP
Tools-35   84.57             84.57             87.86
Myth       77.33             90.67             93.33


In the second set of experiments, we use the Myth and Tools-35 datasets to analyze the performance of the shape retrieval algorithms of [18] and CHOP, considering part shareability and category-wise articulation. The Myth dataset contains three categories, namely Centaur, Horse and Man, with 5 images of 5 different objects in each category. Shapes in the images differ by articulation and additional parts; e.g., the shapes of objects in the Centaur and Man categories share the upper part of the man body, and the shapes of objects in the Centaur and Horse categories share the lower part of the horse body. The Tools-35 dataset contains 35 shapes belonging to 4 categories: 10 scissors, 15 pliers, 5 pincers and 5 knives. Objects within a category differ by articulation. Performance values are calculated using a Bullseye test, as suggested in [18], to compare CHOP with the shape retrieval algorithms Contour-ID [18] and Contour-HF [18]. In the Bullseye test, the five most similar candidates for each query image are considered. The experimental results in Table 2 show that CHOP outperforms Contour-ID and Contour-HF [18], which use distributions of descriptor values calculated at shape contours as shape features that are invariant to articulations and deformations in local part structures. In contrast, part shareability and articulation properties of shapes, which CHOP exploits, may provide discriminative information about shape structures, especially for the images in the Myth dataset.
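A Bullseye-style score can be sketched as follows. The exact protocol is the one defined in [18]; for illustration only, we assume that the query counts as its own first candidate (as is common in the MPEG-7 Bullseye test) and that the score is the percentage of same-category shapes among the retrieved candidates, normalized by the number of queries times the number of candidates. The function name and parameters are hypothetical.

```python
import numpy as np


def bullseye_score(D, labels, num_candidates=5, include_query=True):
    """Percentage of same-category shapes among each query's top candidates.

    Assumptions (illustrative, not taken from [18]): the query counts as its
    own first candidate when include_query is True, and the score is
    normalized by the number of queries times num_candidates.
    """
    D = np.asarray(D, dtype=float)
    labels = np.asarray(labels)
    hits = 0
    for q in range(len(labels)):
        # Other shapes ordered by increasing distance (ties broken by index).
        others = [i for i in np.argsort(D[q], kind="stable") if i != q]
        ranked = ([q] if include_query else []) + others
        top = ranked[:num_candidates]
        hits += int(np.sum(labels[top] == labels[q]))
    return 100.0 * hits / (len(labels) * num_candidates)
```

With `num_candidates=5` this matches the five-candidate setting above; because the Tools-35 categories have different sizes (5 to 15 shapes), the normalization used in [18] may differ from this sketch.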


5 Conclusion 

We have proposed a graph theoretic approach for object shape representation in a hierarchical compositional architecture called Compositional Hierarchy of Parts (CHOP). Two information theoretic algorithms are used to learn a vocabulary of compositional parts employing a hybrid generative-descriptive model. First, statistical relationships between parts are learned using the MCEC algorithm. Then, the part selection problem is defined as a frequent subgraph discovery problem and solved using an MDL principle. Part compositions are inferred considering both the learned statistical relationships between parts and their description lengths at each layer of CHOP.

The proposed approach and algorithms are examined using six benchmark shape datasets consisting of images of an object captured from different viewpoints and images of objects belonging to different categories. The results show that CHOP can efficiently use the part shareability property to construct compact vocabularies and inference trees. For instance, the running time of CHOP for inference on a test image is approximately 0.5 to 3 seconds. Additionally, CHOP constructs compositional shape representations whose part realizations completely cover the shapes in the images. Finally, we compared the shape retrieval performance of CHOP with that of state-of-the-art retrieval algorithms on three benchmark datasets. The results show that, by using part shareability and fast inference of descriptive part compositions, CHOP outperforms or matches the evaluated algorithms in all settings except Top 1 retrieval on Tools-40, where IDSC+DP performs better.

In future work, we will employ discriminative learning for pose estimation and categorization of shapes. In addition, we will implement online and incremental learning, taking into account the results of the part shareability analyses performed in this work.